Genomic Distribution of Pro-Virulent cpdB-like Genes in Eubacteria and Comparison of the Enzyme Specificity of CpdB-like Proteins from Salmonella enterica, Escherichia coli and Streptococcus suis

The cpdB gene is pro-virulent in avian pathogenic Escherichia coli and in Salmonella enterica, where it encodes a periplasmic protein named CpdB. It is structurally related to cell wall-anchored proteins, CdnP and SntA, encoded by the also pro-virulent cdnP and sntA genes of Streptococcus agalactiae and Streptococcus suis, respectively. CdnP and SntA effects are due to extrabacterial hydrolysis of cyclic-di-AMP, and to complement action interference. The mechanism of CpdB pro-virulence is unknown, although the protein from non-pathogenic E. coli hydrolyzes cyclic dinucleotides. Considering that the pro-virulence of streptococcal CpdB-like proteins is mediated by c-di-AMP hydrolysis, S. enterica CpdB activity was tested as a phosphohydrolase of 3′-nucleotides, 2′,3′-cyclic mononucleotides, linear and cyclic dinucleotides, and cyclic tetra- and hexanucleotides. The results help to explain cpdB pro-virulence in S. enterica and are compared with those of E. coli CpdB and S. suis SntA, including the activity of the latter on cyclic tetra- and hexanucleotides, reported here for the first time. On the other hand, since CpdB-like proteins are relevant to host-pathogen interactions, the presence of cpdB-like genes was probed in eubacterial taxa by TblastN analysis. The non-homogeneous genomic distribution revealed taxa with cpdB-like genes present or absent, identifying eubacteria and plasmids where they can be relevant.


Introduction
The CpdB protein was first identified as a periplasmic protein encoded by the cpdB gene of Escherichia coli with attributed 3′-nucleotidase and 2′,3′-cyclic mononucleotide (2′,3′-cNMP) phosphodiesterase activities [1][2][3]. These activities have also been observed in other Gram-negative bacteria [4][5][6]. After the expression of E. coli CpdB as a recombinant protein, it was characterized as a highly efficient enzyme, also active on cyclic and linear dinucleotides (c-di-NMP and pNpN) [7,8]. In Gram-positive bacteria, two CpdB-like cell wall-attached proteins named CdnP and SntA have been studied in Streptococcus agalactiae [9] and Streptococcus suis [10], respectively. They are structurally related to CpdB and also display activities as 3′-nucleotidases and as phosphodiesterases of 2′,3′-cNMP and of linear or cyclic 3′,5′-dinucleotides.
The genes encoding CpdB-like proteins have been recognized as pro-virulent factors of several pathogens. The cpdB gene of avian pathogenic E. coli has been reported to favor the [...] S. enterica controls the transition between biofilm and virulence, and modulates virulence phenotypes: it is a regulator of the sessility versus motility transition, with high levels favoring biofilm formation and inhibiting, for instance, the invasion of intestinal epithelial cells [29,37-40]. The STING-dependent type I interferon response of infected host cells is known to be evoked also by c-di-GMP, including in infections by S. enterica [18,29,41,42]. For instance, in dendritic cells, c-di-GMP triggers STING-mediated innate immunity and induces TH17 cells [29,42].
In this study, based on what is known about cpdB-like bacterial genes and their encoded proteins, we posed two questions (see Scheme 1). Since the hypothetical hydrolytic activity of S. enterica CpdB towards cyclic dinucleotides could be relevant to host-pathogen interaction, an extensive study of its so far unknown enzyme specificity was performed using a recombinant enzyme cloned from serovar Typhimurium genomic DNA. Among other things, we concluded that its detected activity on extracellular c-di-GMP could be the basis for the pro-virulent character of the cpdB gene of this species. However, intrigued by the incidental observation that 40% of the sequenced genomes of S. enterica serovar Typhimurium do not contain a cpdB gene, we also analyzed the genomic distribution of cpdB-like genes in eubacteria, to explore the extent to which these genes could participate in host-pathogen interactions in different eubacterial taxa. We concluded that cpdB-like genes are not ubiquitous among bacterial taxa and that, within many taxa where they occur, their distribution is not homogeneous, in many cases even at the species level. This opens a new global perspective on the role of these genes.

Scheme 1.
Questions addressed in this study and summary of conclusions.

Enzyme Characterization of S. enterica CpdB Protein
The S. enterica CpdB protein, devoid of its signal sequence, was overexpressed in E. coli BL21 cells from plasmid pGEX-6P-3-S.enter_CpdB, which encodes a GST-CpdB fusion protein. The recombinant enzyme present in cell lysate supernatants was purified by affinity chromatography on GSH-Sepharose and recovered free of the GST tag by specific proteolysis with PreScission protease. An extensive enzymatic characterization was performed using 3′-nucleotides, 2′,3′-cNMP, linear dinucleotides, and cyclic di-, tetra- and hexanucleotides, among other substrates. The cyclic oligoadenylates c-tetra-AMP and c-hexa-AMP are second messengers produced by type III CRISPR-Cas systems [43], and little is known about phosphodiesterases that hydrolyze them other than the so-called ring nucleases [44][45][46]. Until now, c-tetra-AMP and c-hexa-AMP had not been tested as substrates of CpdB-like enzymes.
Saturation kinetics were studied with all the CpdB substrates except 2′,3′-cGAMP, by assaying initial rates of phosphohydrolysis at different substrate concentrations. Estimates of kcat, KM and catalytic efficiency (kcat/KM) were obtained by non-linear regression fitting of the Michaelis-Menten equation to the experimental data. In the case of 2′,3′-cGAMP, the catalytic efficiency was estimated from initial-rate assays in which the rate was directly proportional to substrate concentration. The results are shown in Table 1 in order of decreasing catalytic efficiency.
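The fitting step can be illustrated with a minimal, dependency-free Python sketch. It assumes initial rates in M s⁻¹, substrate concentrations in M and a known total enzyme concentration; the function names are hypothetical, and the search strategy (closed-form Vmax for each trial KM, then a log-scale search over KM) is one standard way to do the least-squares fit, not necessarily the software used in this work.

```python
import math


def mm_rate(s, vmax, km):
    """Michaelis-Menten initial rate: v = Vmax*[S] / (KM + [S])."""
    return vmax * s / (km + s)


def fit_michaelis_menten(conc, rates, e_total, km_bounds=(1e-9, 1e-1), n_grid=2000):
    """Least-squares fit of kcat and KM to initial-rate data.

    conc: substrate concentrations (M); rates: initial rates (M/s);
    e_total: total enzyme concentration (M). Returns (kcat, KM, kcat/KM).
    """

    def best_vmax(km):
        # For a fixed KM the model is linear in Vmax, so the optimal Vmax
        # has the closed form sum(v*f) / sum(f*f) with f = [S]/(KM + [S]).
        f = [s / (km + s) for s in conc]
        return sum(v * fi for v, fi in zip(rates, f)) / sum(fi * fi for fi in f)

    def rss(log_km):
        # Residual sum of squares with Vmax profiled out.
        km = 10.0 ** log_km
        vmax = best_vmax(km)
        return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(conc, rates))

    # Coarse scan of log10(KM) on an evenly spaced grid.
    lo, hi = math.log10(km_bounds[0]), math.log10(km_bounds[1])
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    i_best = min(range(n_grid), key=lambda i: rss(grid[i]))

    # Golden-section refinement inside the bracket around the best grid point.
    a, b = grid[max(i_best - 1, 0)], grid[min(i_best + 1, n_grid - 1)]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):
        c, d = b - g * (b - a), a + g * (b - a)
        if rss(c) < rss(d):
            b = d
        else:
            a = c
    km = 10.0 ** ((a + b) / 2.0)
    kcat = best_vmax(km) / e_total
    return kcat, km, kcat / km


if __name__ == "__main__":
    # Synthetic, noise-free data (illustrative values, not from Table 1).
    e0, kcat_true, km_true = 1e-9, 100.0, 2e-5
    s = [1e-6, 2e-6, 5e-6, 1e-5, 2e-5, 5e-5, 1e-4, 2e-4]
    v = [mm_rate(x, kcat_true * e0, km_true) for x in s]
    print(fit_michaelis_menten(s, v, e0))
```

Profiling out Vmax reduces the problem to a one-dimensional search over KM, which keeps the sketch short and avoids the need for a general non-linear optimizer; with real, noisy data one would also want confidence intervals for the parameters.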

The catalytic efficiencies for the substrates tested ranged from very high values (>10⁷ M⁻¹ s⁻¹; near the diffusion rate limit) for 3′-nucleotides, 2′,3′-cNMP and the linear dinucleotide pApA, to low values (<10³ M⁻¹ s⁻¹) for c-tetra-AMP, c-hexa-AMP, 3′,5′-cAMP, NDP-hexoses, 2′,3′-cGAMP, 5′-AMP and 2′-AMP. In fact, no activity was detected with the latter two compounds, highlighting the strict specificity of the enzyme for 3′-nucleotides. Between the two extremes, there are twelve intermediate substrates with catalytic efficiencies ranging from 10⁴ to 10⁶ M⁻¹ s⁻¹. Among them, pGpG, 3′,3′-cGAMP, c-di-AMP and c-di-GMP are relevant, together with pApA, to the role of cyclic dinucleotides as possible intermediates in the interferon response of the infected host. Although they are clearly worse substrates than 3′-nucleotides and 2′,3′-cNMP, they cannot be disregarded, because catalytic efficiencies of 10⁴-10⁶ M⁻¹ s⁻¹ are around the average catalytic efficiency of enzymes (≈10⁵ M⁻¹ s⁻¹) [47].
Table 1 notes.
¹ kcat and KM were calculated from saturation curves obtained at different substrate concentrations; the catalytic efficiencies were calculated by dividing kcat by KM or, when these parameters were not available (2′,3′-cGAMP), by the procedure described in Section 3.2.
² In assays at a fixed 500 µM concentration, the activities on 5′-AMP and 2′-AMP represented less than 0.0006% of the activity on 3′-GMP and less than 2% of the activity on 3′,5′-cAMP. Nd: not determined.
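For a substrate such as 2′,3′-cGAMP, where saturation could not be reached, the standard low-substrate limit of the Michaelis-Menten equation shows why an initial rate proportional to [S] still yields kcat/KM. We assume, without being able to confirm it from this section, that the procedure of Section 3.2 rests on this limit; here v₀ is the initial rate and [E]₀ the total enzyme concentration.

```latex
v_0 = \frac{k_{cat}\,[E]_0\,[S]}{K_M + [S]}
\;\xrightarrow{\;[S] \ll K_M\;}\;
v_0 \approx \frac{k_{cat}}{K_M}\,[E]_0\,[S]
\quad\Longrightarrow\quad
\frac{k_{cat}}{K_M} \approx \frac{v_0}{[E]_0\,[S]}
```

Equivalently, the slope of v₀ versus [S] in the linear range, divided by [E]₀, gives the same estimate without requiring kcat or KM individually.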
Given the periplasmic location of CpdB, one would expect it to target extracytoplasmic c-di-GMP. In this context, the hydrolysis of c-di-GMP by the periplasmic CpdB of S. enterica, followed by the degradation of the pGpG product by the same enzyme, could explain the pro-virulence of the cpdB gene [12]. Removal of the secreted dinucleotide would hinder the host immune response, a defense strategy similar to that followed by streptococci, which hydrolyze secreted bacterial c-di-AMP to diminish the innate response of infected host cells [9,10].
Another interesting aspect of CpdB is its high activity towards 2′,3′-cNMP. These compounds are formed by RNase I, an enzyme present in the bacterial cytosol and periplasmic space [48][49][50][51]. Therefore, at least in the latter compartment, 2′,3′-cNMP formed by RNase I would be hydrolyzed by periplasmic CpdB. Recently, 2′,3′-cNMP have been proposed as a novel class of bacterial signals [50][51][52][53]. In E. coli, they have clear physiological effects on gene expression, flagellar motility, biofilm formation and acid tolerance. In S. enterica, despite its evolutionary closeness to E. coli, the response to 2′,3′-cNMP is quite different. To begin with, out of the many genes that are dysregulated upon 2′,3′-cNMP depletion, only two show consistent changes in both species. In general, there is little overlap between the respective cellular responses [51]. In any case, the possible physiological impacts of extracytoplasmic 2′,3′-cNMP and of their hydrolysis by periplasmic CpdB are unknown.

Comparisons of CpdB-like Enzyme Specificities of Different Bacteria
Besides the Table 1 data, there are two published reports of detailed substrate specificity of CpdB-like enzymes with kinetic parameters, one for E. coli CpdB [7] and the other for S. suis SntA [10]. Figure 1 presents the comparison of S. enterica CpdB with S. suis SntA, while S. enterica CpdB is compared to E. coli CpdB in Figure 2. A direct comparison between S. suis SntA and E. coli CpdB can be found elsewhere [10].
The comparison of S. enterica CpdB with S. suis SntA in terms of kcat/KM (Figure 1c) revealed two groups of substrates, depending on whether the ratio of catalytic efficiencies between the two enzymes was higher or lower than unity. The enzyme from S. enterica was less efficient than that from S. suis for 10 substrates, including cyclic oligonucleotides, particularly for c-hexa-AMP and c-di-AMP, and (much) less markedly for 2′,3′-cGAMP, c-di-GMP, 3′,3′-cGAMP and c-tetra-AMP. In all these cases, the lower efficiency of S. enterica CpdB was generally related to higher KM values (except for c-tetra-AMP; Figure 1b) and lower kcat values (except for c-di-GMP and 3′,3′-cGAMP; Figure 1a). For the other 16 substrates, on the other hand, the enzyme from S. enterica was more efficient than that from S. suis (Figure 1c), generally owing to lower KM (Figure 1b) and higher kcat values (Figure 1a).
The results of this comparison are similar to those obtained when S. suis SntA was compared to E. coli CpdB [10], although fewer substrates were available in that case. The differences of SntA versus E. coli CpdB were more marked than those versus S. enterica CpdB. This is better reflected in the direct comparison between the two CpdB enzymes (Figure 2), where all the substrates that could be compared were more efficiently hydrolyzed by the enzyme from S. enterica (Figure 2c), particularly the linear dinucleotides, reflecting higher kcat and lower KM values with a few exceptions (Figure 2a,b).
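The grouping of substrates by whether the efficiency ratio is above or below unity, and their ordering on a logarithmic scale as in Figures 1c and 2c, can be reproduced with a short Python sketch. The function names, substrate names and efficiency values below are hypothetical placeholders, not data from Table 1 or refs. [7,10].

```python
import math


def log_efficiency_ratios(eff_a, eff_b):
    """Return (substrate, log10(eff_a/eff_b)) for substrates measured with
    both enzymes, sorted from higher to lower ratio as in the figures."""
    common = set(eff_a) & set(eff_b)
    pairs = [(s, math.log10(eff_a[s] / eff_b[s])) for s in common]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


def split_by_unity(pairs):
    """Split substrates into those favoured by enzyme A (ratio > 1, log > 0)
    and by enzyme B (ratio < 1, log < 0); exact ties go to neither list."""
    better_a = [s for s, r in pairs if r > 0]
    better_b = [s for s, r in pairs if r < 0]
    return better_a, better_b


if __name__ == "__main__":
    # Hypothetical kcat/KM values (M^-1 s^-1), for illustration only.
    enzyme_a = {"substrate_1": 1e7, "substrate_2": 1e4, "substrate_3": 5e5}
    enzyme_b = {"substrate_1": 1e6, "substrate_2": 1e5, "substrate_4": 1e3}
    ranked = log_efficiency_ratios(enzyme_a, enzyme_b)
    print(ranked)
    print(split_by_unity(ranked))
```

Substrates measured with only one of the two enzymes are dropped, which mirrors the blank columns mentioned in the figure captions.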

Structural Comparison of CpdB-like Proteins of Different Bacteria
The specificity differences among the three CpdB-like enzymes studied are expected to result from sequence and structural differences among the proteins. The protein alignment of Figure 3 shows separately the differences between S. suis SntA and S. enterica CpdB (above the alignment) and those between E. coli CpdB and S. enterica CpdB (below the alignment). There are many differences between the sequences of S. suis SntA and S. enterica CpdB. The former is 813 amino acids long, and only 283 of them are identical in the alignment. Within the parts that align with S. enterica CpdB, there are several gaps in either SntA or CpdB. The amino acid sequences of the CpdB proteins from S. enterica and E. coli are 90.3% identical: both proteins are 647 amino acids long, 584 of which are identical in the alignment, and they align without any gap. These differences (Figure 3) should underlie the different specificity of SntA versus CpdB (Figure 1c) and the higher efficiency of S. enterica CpdB compared to E. coli CpdB (Figure 2c).
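The identity figures above follow from simple counts over a pairwise alignment. The Python sketch below (hypothetical function names) counts identical and gap-free aligned columns and reproduces the 90.3% value from the counts given in the text; note that percent identity depends on the chosen denominator (aligned columns versus full protein length), which coincide only for a gap-free alignment of two proteins of equal length, as here.

```python
def alignment_counts(row_a, row_b, gap="-"):
    """Count identical and aligned (gap-free) columns in a pairwise alignment.

    Both rows must have equal length (gaps included)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    identical = sum(1 for a, b in aligned if a == b)
    return identical, len(aligned)


def percent_identity(identical, length):
    """Percent identity relative to a chosen reference length."""
    return 100.0 * identical / length


# Counts quoted in the text for S. enterica vs E. coli CpdB:
print(round(percent_identity(584, 647), 1))  # 90.3
print(647 - 584)  # 63 non-identical positions
```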
Currently, there are no crystal structures available for any of the three proteins considered, and the AlphaFold Protein Structure Database [54,55] contains a model only for S. enterica CpdB (UniProt ID P26265; AF-P26265-F1-model_v4.pdb). Therefore, to evaluate possible structural differences among the three proteins, we used homology models of E. coli CpdB and S. suis SntA prepared with the AlphaFold structure of S. enterica CpdB as the template. The homology models were obtained with the Phyre2 server [56].
Figure 1. Comparison of S. enterica CpdB with S. suis SntA. Data for S. enterica CpdB were from Table 1, while those for S. suis were from previous work [10], except for c-tetra-AMP and c-hexa-AMP (Table 1). The bars represent parameter ratios in logarithmic scale: (a) ratios of kcat values; (b) ratios of KM values; (c) ratios of kcat/KM values. Blank columns in panels (a,b) indicate absence of the corresponding parameter (kcat or KM) for S. enterica CpdB and/or S. suis SntA. In the three panels, the substrates are ordered from higher to lower ratios of catalytic efficiency (kcat/KM).
Figure 2. Comparison of S. enterica CpdB with E. coli CpdB. Data for S. enterica CpdB were from Table 1, while those for E. coli were from previous work [7]. The bars represent parameter ratios in logarithmic scale: (a) ratios of kcat values; (b) ratios of KM values; (c) ratios of kcat/KM values. In the three panels, the substrates are ordered from higher to lower ratios of catalytic efficiency.
Figure 3. Sequence alignment of the proteins used in this study. The alignment was prepared with Clustal Omega online (https://www.ebi.ac.uk/Tools/msa/clustalo/; accessed 5 February 2023) with a few manual edits. The sequences correspond to NCBI Protein accessions: Ssui, AYV64543; Sent, P26265; Ecol, AKS04560 with the addition of the signal sequence.
The symbol = above the Ssui sequence and below the Ecol sequence indicates identity with the Sent sequence. The recombinant proteins studied start and finish at the amino acids marked with arrowheads. The sequences within boxes correspond to the interdomain linkers between the N-terminal "metallophos" and the C-terminal "5_nucleotid_C" domains (see the structures shown in Figure 4). Bold-type amino acids in the Sent sequence are either those coordinated with the metal ions (in the metallophos domain) or those located at ≤4 Å of a 3′-AMP substrate modeled in the active site of E. coli CpdB (in the 5_nucleotid_C domain). Shadowed in blue are His117, which has a catalytic role in enzymes of the metallophosphatase family, and Tyr440 and Tyr544, which form a sandwich with the nitrogen base of substrates like 3′-AMP [8,10].
Red lines above the Ssui sequence mark the most significant differences found with respect to the Sent sequence, most of them related to alignment gaps in either sequence. Amino acids in red type in the Ecol sequence mark the differences with respect to the Sent sequence. All these differences are highlighted in red in the structures shown in Figure 4a,c.
Figure 4. [...] S. enterica CpdB (taken from [8]; it fits equally well in the other proteins, not shown). The dimetallic centers of the three proteins are shown (grey spheres). The amino acid side chains configuring the substrate-binding center in the 5_nucleotid_C domain (colored in green), and those coordinated to the metal ions in the metallophos domain (colored in blue), are all sequentially and spatially conserved in the three proteins. In the SntA structure (a), parts colored in red (except A714-S715) are those showing structural differences with respect to CpdB (marked also in Figure 3), and those colored in orange depict the parts of CpdB that are substituted in SntA. In the E. coli CpdB structure (c), parts colored in red indicate sequence differences with respect to S. enterica CpdB (marked also in Figure 3). For comments on s1-s10 and other labels, see the text of Section 2.3.
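The selection of residues located at ≤4 Å of a modeled 3′-AMP, mentioned in the Figure 3 caption, is a plain distance criterion on atomic coordinates. The Python sketch below is an assumption about how such a selection could be made (it is not the authors' tool): it reads ATOM/HETATM records using the fixed column layout of the PDB format, as in an AlphaFold model file, and keeps every residue with at least one atom within the cutoff of any ligand atom. The function names and the ligand residue name "LIG" used in the usage note are hypothetical.

```python
import math
from typing import NamedTuple, Tuple


class Atom(NamedTuple):
    record: str                      # "ATOM" or "HETATM"
    res_name: str
    chain: str
    res_seq: int
    xyz: Tuple[float, float, float]


def parse_pdb_atoms(lines):
    """Parse ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append(Atom(
            record=record,
            res_name=line[17:20].strip(),   # columns 18-20
            chain=line[21],                 # column 22
            res_seq=int(line[22:26]),       # columns 23-26
            xyz=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
        ))
    return atoms


def residues_within(protein_atoms, ligand_atoms, cutoff=4.0):
    """Residues with at least one atom <= cutoff (angstrom) from any ligand atom."""
    hits = set()
    for atom in protein_atoms:
        if any(math.dist(atom.xyz, lig.xyz) <= cutoff for lig in ligand_atoms):
            hits.add((atom.chain, atom.res_seq, atom.res_name))
    return sorted(hits)
```

In use, protein atoms would be the ATOM records and ligand atoms the HETATM records of the (hypothetical) residue "LIG"; the brute-force double loop is adequate for a single ligand in a protein of about 650 residues.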
To analyze how the differences among the three proteins may bear on their specificity and catalytic efficiency, it is necessary to consider the dynamic events occurring during the catalytic cycle of the metallophosphatases that contain a 5_nucleotid_C domain (Scheme 2). This cycle has been inferred from detailed studies of the 5′-nucleotidase UshA [57][58][59][60][61][62] and was recently extrapolated to CpdB [8]. The 5_nucleotid_C domain contains the substrate-binding pocket, which faces the medium in the "open" conformation. After substrate binding, this domain undergoes a 96° rotation towards the "closed" conformation, bringing the scissile linkage of the substrate to the catalytic dimetallic site of the metallophos domain, where phosphohydrolysis takes place. This is the conformation shown in the models of Figure 4.
The sequence differences between S. suis SntA and S. enterica CpdB are too numerous to warrant a systematic analysis of all of them (there are 364 different amino acids within the aligned regions in Figure 3). Therefore, attention was focused on the gaps arising in the alignment: 17 gaps in the SntA sequence and 6 gaps in the CpdB sequence. They are marked by red lines above the SntA sequence (Figure 3) and colored in red in the 3D model (Figure 4a; s1-s10). In relation to these gaps, the SntA model presents structural variations with respect to CpdB, as can be confirmed by careful comparison of Figure 4a with Figure 4b. This is highlighted in Figure 4a by coloring in orange the parts of CpdB that do not overlap with SntA.
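Counting alignment gaps, as done above for the SntA and CpdB rows of Figure 3, amounts to counting runs of consecutive gap characters in each aligned row. The Python sketch below is an assumption about how such a count could be made (hypothetical function name); since the text does not state whether terminal gaps were included in the 17 and 6 gaps, excluding them is left as an option.

```python
import re


def gap_runs(aligned_row, gap="-", internal_only=False):
    """Return (start, length) for each run of consecutive gap characters in
    one row of an alignment (0-based columns). With internal_only=True,
    runs touching either end of the row (terminal gaps) are skipped."""
    runs = []
    for m in re.finditer(re.escape(gap) + "+", aligned_row):
        if internal_only and (m.start() == 0 or m.end() == len(aligned_row)):
            continue
        runs.append((m.start(), m.end() - m.start()))
    return runs
```

For a Clustal Omega output, each row would first be reassembled from its interleaved blocks into one string per sequence; the number of gaps in that row is then `len(gap_runs(row))`.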
Most of the structural differences between SntA and the CpdB proteins are located in the 5_nucleotid_C domain (s4-s10; Figure 4a), which is responsible for substrate binding in the open conformation (not shown) and undergoes the large rotation needed to bring the substrate to the catalytic site (Scheme 2). Several of the structural differences occur in regions near the active site in the closed conformation (s2, s5, s6 and s7), or near the region where twisting occurs during the rotation (s3, s5, s10). The most conspicuous difference is the one marked as s3, which affects amino acids 419-424 of S. suis SntA; in the S. enterica and E. coli CpdB proteins, these are replaced by amino acids 322-332, which include two lysine residues (Lys327 and Lys328) absent in SntA. In CpdB proteins, this structural variation is associated with the presence of two aspartates (Asp634 and Asp636), which are also different in SntA (Ala714 and Ser715). As can be seen in Figure 4b, Lys327 (in the metallophos domain) and the two aspartates (in the 5_nucleotid_C domain) may establish an electrostatic interaction during rotation of the latter domain. This could delay the full closing of the active site of the CpdB proteins and at least partly explain some of the differences in catalytic efficiency (Figure 1). This analysis is complicated by the variety of substrates hydrolyzed by the enzymes, and by the possibility that the "closed" conformation is not the same with substrates of different sizes, e.g., 3′-nucleotides and cyclic dinucleotides.
Despite the 63 non-identical amino acids in the sequences of the S. enterica and E. coli proteins, their 3D structures were practically indistinguishable in the overlapped models (not shown, but compare Figure 4b with Figure 4c). Therefore, we focused our analysis on the differing positions, which are highlighted in red both in the E. coli CpdB sequence (Figure 3) and in the 3D model (Figure 4c). None of these variations appears close enough to the active site in the "closed" conformation to explain the higher efficiency of S. enterica CpdB (Figure 2c). However, one of the sequence differences (marked as Q350 in Figure 4c) is located in the region of the interdomain linker that twists during the large rotation of the 5_nucleotid_C domain that brings the substrate towards the catalytic site in the metallophos domain (see Scheme 2). The difference is a substitution of Gln350 in E. coli CpdB by Glu350 in S. enterica CpdB. It is conceivable that the negative charge favors the rotation and makes it faster. This could account for the larger kcat values observed with many substrates, but it explains neither why this does not occur with all substrates, nor the differences in KM (Figure 2a,b). Similar reasoning can be applied to other differences near the Q350 mark in Figure 4c: I324, N326, E339, T340, Y421, R428 and S569, since they are located near the region twisted during the rotation of the 5_nucleotid_C domain; also of interest is G186, not far from the space occupied by substrates in the closed conformation. In E. coli CpdB, they represent significant substitutions with respect to S. enterica CpdB: Gly186Ile, Ile324Ala, Asn326Ala, Glu339Gly, Thr340Ile, Tyr421Phe, Arg428Gln and Ser569Ala. All of these substitutions imply differences in charge, polarity, hydrophobicity and/or size of the side chain at those positions.
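The physicochemical changes listed above can be tabulated automatically. The Python sketch below parses substitutions written as in the text (three-letter code, position, three-letter code, read here as S. enterica residue, position, E. coli residue) and reports the change in side-chain charge and in Kyte-Doolittle hydropathy. The choice of the Kyte-Doolittle scale and of a simple charge model (His treated as neutral) is ours, not the authors'; size changes are not covered, and Glu350Gln is derived from the Gln350/Glu350 difference described above.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

# Approximate side-chain charge at neutral pH; His is treated as neutral.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def describe_substitution(code):
    """Parse e.g. 'Glu339Gly' and report charge and hydropathy changes."""
    m = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", code)
    if m is None:
        raise ValueError(f"unrecognized substitution: {code}")
    old, new = THREE_TO_ONE[m.group(1)], THREE_TO_ONE[m.group(3)]
    return {
        "position": int(m.group(2)),
        "from": old,
        "to": new,
        "charge_change": CHARGE.get(new, 0) - CHARGE.get(old, 0),
        "hydropathy_change": round(KYTE_DOOLITTLE[new] - KYTE_DOOLITTLE[old], 1),
    }


# Substitutions as listed in the text, plus Glu350Gln (see lead-in).
for code in ["Gly186Ile", "Ile324Ala", "Asn326Ala", "Glu339Gly", "Thr340Ile",
             "Tyr421Phe", "Arg428Gln", "Ser569Ala", "Glu350Gln"]:
    print(code, describe_substitution(code))
```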
Altogether, the structural dataset provided by this study paves the way for future studies of mutagenesis to elucidate the molecular basis of the differential specificity and catalytic efficiency of the three CpdB-like proteins compared.

Genomic Distribution of cpdB-like Genes in Eubacteria
To perform a systematic study of this distribution, the strategy explained in Materials and Methods Section 3.4 was applied to the Bacteria taxa of the NCBI Taxonomy browser [63] at different levels (Tables 2-6). TblastN analyses [64,65] were run using S. enterica CpdB (accession number P26265) as the query, with the score and query coverage limits indicated. A score limit of 150 was chosen taking into account the occurrence of CpdB-like homologs named 5′-nucleotidase/UDP-sugar hydrolase (UshA) [66,67], which have a two-domain structure similar to that of CpdB. In BlastP comparisons, most UshA proteins align with P26265 with scores < 130 (as compared to scores > 1000 for alignments between CpdB proteins from different bacteria). Nevertheless, the limit of 150 is somewhat arbitrary: one cannot totally rule out that some true cpdB relatives align with P26265 with lower scores, while choosing a lower limit to avoid this would count some ushA genes as cpdB-like. The borderline hits in every Bacteria phylum (Table 2), when tested by BlastP, showed a (much) better alignment score with CpdB than with UshA. In the few cases where this was not so, the affected hits were removed (see footnotes 4 and 3 of Tables 2 and 3, respectively). The limit of 70% query coverage was chosen to ensure that the two domains typical of CpdB are covered by the alignment. In principle, the search was performed among genome "sequences from type material" [68], but in some cases this restriction was removed (see below).
Table 2 notes.
¹ Those phyla without any fully sequenced genome of type material were not included in the table.
² A "Genome hit" corresponds to a TblastN alignment with score > 150 and query coverage > 70% in the Complete Genomes database (limited to "sequences from type material", records without "plasmid" in the title, and the indicated taxonomy ID). "Total" refers to the number of sequences in the database with the same limits. Data in parentheses were obtained by removing the search limit "sequences from type material". Accessed on 13-23 December 2022. The numbers vary slightly depending on the access date.
³ The key indicates, in a somewhat subjective manner, the presence of cpdB-like genes in the bacterial taxa: W, widespread; P, partial; N, negative; m, mainly (as a modifier of W or N, indicating that there are a few exceptions and/or only a single genome available).
⁴ In these cases, one hit with score 151 (in Actinobacteria) or 152 (in Firmicutes) was removed, as it showed higher BlastP alignment scores with UshA than with CpdB (see the text).
Another point to be aware of is that some organisms, instead of or in addition to separate CpdB and UshA proteins, may express a natural fusion of both resulting from a two-gene fusion [69]. Such a protein was experimentally observed and characterized in Bacillus subtilis [70], and it is detected mainly in sequenced genomes of the phylum Firmicutes (classes Bacilli and Clostridia). The fused genes were counted as cpdB-like, since P26265 aligns well with their cpdB moiety, and no attempt was made to correct for this. Notably, the CpdB-UshA natural fusions may be enzymatically active [70].
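The hit-acceptance rules described above (score > 150, query coverage > 70%, and removal of borderline hits that align better with UshA than with CpdB in BlastP) can be expressed as a small filter. The Python sketch below uses a hypothetical `Hit` record rather than any real BLAST output parser; coverage is computed from a single alignment span over the 647-residue query, which is a simplification, since the coverage reported by web BLAST may combine several aligned segments.

```python
from dataclasses import dataclass
from typing import Optional

QUERY_LENGTH = 647  # length of S. enterica CpdB (P26265), as stated in the text


@dataclass
class Hit:
    subject: str                           # hypothetical field names throughout
    score: float                           # TblastN alignment score vs P26265
    q_start: int                           # first query residue covered (1-based)
    q_end: int                             # last query residue covered (1-based)
    blastp_vs_cpdb: Optional[float] = None  # BlastP re-check scores, if done
    blastp_vs_usha: Optional[float] = None


def query_coverage(hit, query_length=QUERY_LENGTH):
    """Percent of the query spanned by a single alignment."""
    return 100.0 * (hit.q_end - hit.q_start + 1) / query_length


def is_cpdb_like(hit, min_score=150.0, min_coverage=70.0):
    """Apply the score and coverage limits, then the UshA re-check."""
    if hit.score <= min_score or query_coverage(hit) <= min_coverage:
        return False
    # Borderline hits re-checked by BlastP: drop those closer to UshA.
    if hit.blastp_vs_cpdb is not None and hit.blastp_vs_usha is not None:
        return hit.blastp_vs_cpdb > hit.blastp_vs_usha
    return True
```

The strict inequalities follow the "> 150" and "> 70%" limits as written in the Table 2 notes.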
Following the described search strategy and limits, out of 83,531 sequences of complete genomes of Bacteria (NCBI:txid2), 1772 gave significant TblastN alignments with S. enterica CpdB, and 984 aligned with score > 150 and query coverage > 70%. In contrast, the superkingdom Archaea (NCBI:txid2157) gave no significant alignments with the same limits. In Tables 2-6, the nearly one thousand cpdB-like genes found in Bacteria are shown distributed among taxonomic groups according to different levels of classification. Results obtained at the level of phylum or groups of phyla are shown in Table 2, where all the well-established phyla are included except those for which, at the time of running the final search (15 December 2022), sequenced genomes of type material were not available. Further exploration was run at the level of class, only within the phyla Proteobacteria and Firmicutes (Table 3). Thereafter, results at the level of order were obtained only for those belonging to the classes Gammaproteobacteria and Bacilli (Table 4), and results at the level of family only for those belonging to the orders Enterobacterales and Lactobacillales (Table 5). Finally, an extensive selection of specific examples of pathogens of clinical interest is included in Table 6. Interestingly, the genomic distribution of cpdB-like genes among the genomes of Bacteria was not homogeneous, as indicated in Tables 2-6 by the qualification keys "N" (Negative), "Nm" (Negative, mainly), "P" (Partial), "Wm" (Widespread, mainly) and "W" (Widespread).
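Because the qualification keys were assigned "in a somewhat subjective manner", any automated version is only an approximation. The Python sketch below shows one way the N/Nm/P/Wm/W labels could be derived from hit counts per taxon; the thresholds (`mostly`, `few_hits`, `minor_fraction`) are hypothetical values of our own choosing, not the authors', and the sketch ignores alignment scores, which the text also takes into account.

```python
def percent_with_hits(n_hits, n_total):
    """Percentage of genomes in a taxon that gave a qualifying hit."""
    return 100.0 * n_hits / n_total


def qualification_key(n_hits, n_total, mostly=0.9, few_hits=2, minor_fraction=0.5):
    """Hypothetical threshold-based version of the N/Nm/P/Wm/W key."""
    if n_total <= 0:
        raise ValueError("no genomes available")
    if n_hits == 0:
        return "N"
    if n_total == 1:
        return "Wm"  # single genome with a hit: 'mainly' modifier
    frac = n_hits / n_total
    if frac == 1.0:
        return "W"
    if frac >= mostly:
        return "Wm"
    if n_hits <= few_hits and frac < minor_fraction:
        return "Nm"
    return "P"


# Pasteurellales without the type-material limit (from the text): 460/532
print(round(percent_with_hits(460, 532)))  # 86
```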
Let us first consider the results obtained at the level of phyla (Table 2). The presence of cpdB-like genes was clear in Proteobacteria, Firmicutes, Deinococcus-Thermus, Spirochaetes, Thermotogae, Actinobacteria and the FCB group of phyla. In none of them was the presence widespread; it was only partial, meaning that, out of the tens to hundreds of sequenced genomes of type material for each of those phyla, between 11% and 51% gave hits indicative of cpdB-like genes. A wide range of scores was obtained, from near the limit of 150 to high values, which were higher in Proteobacteria than in the other phyla (an expected result, as the query is a protein from a proteobacterial species; see below). In addition, the phyla Coprothermobacterota and Calditrichaeota, each with a single sequenced genome, each contained a low-score hit. The rest of the phyla were either mainly negative, giving 1-2 hits with low scores in 5-221 sequenced genomes, or fully negative in 1-51 sequenced genomes. For all the phyla that gave only 0-2 hits in the available sequenced genomes of type material, the search was extended to additional sequenced genomes by removing the limit to type material (data also shown in Table 2). This revealed a small number of additional hits that did not modify the qualification key of the genomic distribution for any phylum.
In summary, out of the 27 phyla or groups of phyla with complete genomes available, cpdB-like genes are absent or nearly absent in 18 and present in the other 9. In the latter, the distribution is partial rather than homogeneous, with some genomes containing a cpdB-like gene and others not, except for the two phyla with only one sequenced genome.
Further exploration of cpdB-like genes at levels below phylum was centered on Proteobacteria and Firmicutes, which have many sequenced type-material genomes, 30% and 39% of which, respectively, gave hits, many with high scores. There were 139 hits with score > 1000 in Proteobacteria, and 72 hits with score > 500 in Firmicutes. This difference reflects the sequence divergence between genes coding for periplasmic enzymes (such as S. enterica CpdB) and cell wall-bound ones (such as S. suis SntA): when the TblastN was repeated using the SntA sequence (accession AYV64543) as the query, the scores were higher for Firmicutes than for Proteobacteria. It may be noted that the CpdB-like enzymes compared in Sections 2.1 and 2.2 come either from Proteobacteria, family Enterobacteriaceae (S. enterica and E. coli), or from Firmicutes, family Streptococcaceae (S. suis).
In Table 3, the phyla Proteobacteria and Firmicutes are subdivided into classes, which also showed a non-ubiquitous and non-homogeneous distribution of cpdB-like genes. In Proteobacteria, 5-42% of the type-material genomes of Gammaproteobacteria, Betaproteobacteria, Alphaproteobacteria and the Delta/epsilon subdivisions gave hits with high scores. In Firmicutes, Bacilli and Clostridia gave hits in 22% and 51% of the genomes, respectively. The other Proteobacteria and Firmicutes classes were mainly negative or negative, except Erysipelotrichia, with a partial distribution of cpdB-like genes with very low scores, and Limnochordia, with a moderately high-score hit in its single sequenced genome.
In Table 4, the orders belonging to classes Gammaproteobacteria and Bacilli were analyzed. Here, for the first time in the TblastN analysis, a taxonomical level appeared in which 100% of the type-material genomes gave hits (leaving aside taxa with a single sequenced genome), namely the order Pasteurellales. For this order, repeating the TblastN without the limit "sequences from type material" gave hits in 86% of the total genomes (460/532). In addition, the order Enterobacterales gave hits in 92% of the type-material genomes, while Moraxellales, Vibrionales, Alteromonadales, Oceanospirillales, Cellvibrionales, Aeromonadales, Xanthomonadales, Orbales, Bacillales and Lactobacillales gave hits in 9% to 67% of the type-material genomes. The remaining orders, including Pseudomonadales and Legionellales, were mainly negative or negative.
In Table 5, the families belonging to orders Enterobacterales and Lactobacillales were analyzed. Among Enterobacterales, the families Morganellaceae, Enterobacteriaceae, Yersiniaceae and Pectobacteriaceae showed a nearly widespread distribution of cpdB-like genes, whereas Erwiniaceae, Hafniaceae and Budviciaceae displayed a partial distribution. Bruguierivoracaceae was the only family with clearly negative results. Among Lactobacillales, all the families exhibited a partial distribution.
In Table 6, selected examples are shown, at the level of species, groups of species, or genus, of clinically relevant bacteria that either contain or do not contain a cpdB-like gene. In this case, TblastN analyses were always run without the limit "sequences from type material"; therefore, the results include all the available complete genomes for each species. At this level, 34 species or groups showed a completely or mainly widespread distribution of cpdB-like genes, i.e., they were present in 100% or near 100% of the genomes; 10 species showed a partial distribution, with some genomes containing and others not containing cpdB-like genes in the same species; and 28 species were negative or mainly negative as they were devoid of cpdB-like genes in 100% or near 100% of the genomes.
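The qualification keys can be made explicit as a simple rule on the fraction of genomes with hits. The sketch below is only illustrative: the article gives no numeric cut-off for "near 100%" or "near 0%", so the 5% margin (`NEAR_MARGIN`) and the function names are hypothetical. The authors' assignments also weighed hit scores and the number of sequenced genomes, so this rule will not reproduce every entry in Tables 2-6.

```python
NEAR_MARGIN = 0.05  # hypothetical: the article states no numeric cut-off for "near"


def percent_with_hits(hits: int, genomes: int) -> int:
    """Percentage of genomes with a cpdB-like hit, rounded to a whole number."""
    return round(100 * hits / genomes)


def qualification_key(hits: int, genomes: int, margin: float = NEAR_MARGIN) -> str:
    """Assign N, Nm, P, Wm or W from the fraction of genomes with hits."""
    if genomes <= 0 or not 0 <= hits <= genomes:
        raise ValueError("need genomes > 0 and 0 <= hits <= genomes")
    fraction = hits / genomes
    if hits == 0:
        return "N"    # no genome with a hit
    if hits == genomes:
        return "W"    # every genome with a hit
    if fraction <= margin:
        return "Nm"   # hits in only a marginal fraction of genomes
    if fraction >= 1 - margin:
        return "Wm"   # hits in nearly all genomes
    return "P"        # some genomes with hits, others without
```

For example, the Pasteurellales count obtained without the type-material limit (460 of 532 genomes) gives 86% and, under this hypothetical margin, a "P" key.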
Let us now discuss the possible repercussions of the three kinds of gene distribution found, taking into account that the cpdB-like genes of E. coli, S. enterica, S. agalactiae and S. suis are pro-virulent in different organisms [9][10][11][14]. Both the presence and the absence of cpdB-like genes in the genome can be relevant (although, of course, not exclusively) to the degree of virulence of the pathogen.
First, for species that did not contain cpdB-like genes (i.e., those indicated with the N key in Table 6), it can be safely concluded that these organisms cannot exploit the CpdB-like protein-dependent strategy of degrading extracellular cyclic dinucleotides recognized as PAMPs by the infected host [9,10], or of interfering with the complement system [14]. Of course, other proteins may replace CpdB-like ones. For instance, this occurs in Mycobacterium tuberculosis, which is negative for cpdB-like genes (Table 6) but expresses a pro-virulent cyclic nucleotide phosphodiesterase, encoded by the Rv2837c or cnpB gene, that inhibits innate immune cytosolic surveillance [19,71]. Incidentally, this M. tuberculosis protein has also been named CdnP [71], like the CpdB-like protein of S. agalactiae [9], but its encoding gene was not a hit in the TblastN search run with S. enterica CpdB (Table 6), as they are very different proteins encoded by different genes.
Second, species in which cpdB-like genes were widespread (i.e., those indicated with the W key in Table 6) constitute a field where the possible role of these genes in virulence can be explored, either by constructing gene mutants and testing them in suitable infection systems in comparison with wild-type bacteria, or by expressing the encoded proteins and studying their enzyme specificity. By extension of what is known about the pro-virulent role of cpdB-like genes, and of CpdB-like enzyme activities, in E. coli, S. enterica, S. agalactiae and S. suis [7][8][9][10][11][14], this strategy could be fruitful if applied to other species. For instance, it would be worth exploring genera such as Bacillus, Enterobacter, Haemophilus, Klebsiella, Morganella, Pasteurella, Proteus, Providencia, Serratia, Shigella and Yersinia, among others, which contain cpdB-like genes aligning with high TblastN scores with S. enterica CpdB (Table 6).
Third, species with a partial distribution of cpdB-like genes are particularly interesting, as this indicates that different strains or isolates differ in this respect. This occurred very markedly in pathogens such as S. enterica subsp. enterica ser. Typhimurium, Streptococcus dysgalactiae and Vibrio cholerae, to mention those that gave relatively high TblastN scores for alignment with S. enterica CpdB (Table 6). In these cases, one should consider whether the presence or absence of a cpdB-like gene could modulate the virulence of pathogen strains or isolates.
Another interesting observation from Table 6 is that species of the same genus may differ drastically in their content of cpdB-like genes. This was the case for the genus Streptococcus: all the genomes of S. agalactiae, Streptococcus sanguinis, S. suis (with one exception) and S. thermophilus contained a cpdB-like gene, whereas those of Streptococcus mitis, Streptococcus mutans, Streptococcus pneumoniae and Streptococcus pyogenes did not, and those of S. dysgalactiae and Streptococcus parasuis showed a partial distribution. This was confirmed by repeating the TblastN searches with S. suis SntA as the query: the scores were higher than those shown in Table 6 (obtained with S. enterica CpdB as the query), but the distribution of sntA-like genes was the same as in Table 6 for every Streptococcus species. Another example worth commenting on is the genus Salmonella, which was much more homogeneous in its content of cpdB-like genes: these were widespread in S. bongori and in S. enterica subspecies arizonae, diarizonae, houtenae, salamae VII, and enterica serovar Typhi, whereas the distribution was markedly partial in serovar Typhimurium. Concerning the genus Escherichia, the presence of cpdB-like genes was almost constant, and only a very minor proportion of E. coli genomes (0.3%) lacked them.

Anecdotal Findings of cpdB-like Genes Outside Eubacterial Chromosomal Genomes
Incidentally, besides the findings summarized in Tables 2-6 for chromosomal genomes, we also observed cpdB-like genes in some unexpected genomic locations, including sequences from plasmids, Viruses (NCBI:txid10239) and Eukaryota (NCBI:txid2759). To analyze this as deeply as possible, different ad hoc TblastN searches were run in the NCBI Nucleotide (nr/nt) database with S. enterica CpdB as the query, as described below.
Within the superkingdom Archaea, the TblastN run without any limits, other than the taxonomical one, gave no hits with score > 150.
Within the superkingdom Bacteria, applying the Entrez query "plasmid[Title]", 41 plasmid sequences containing cpdB-like genes, with scores ranging from 169 to 1303, were recovered (Table S1). Seven of these hits showed TblastN scores > 1000, with 100% query coverage and > 88% identity, and belonged to Salmonella sp., Klebsiella pneumoniae and E. coli. Hits with lower scores corresponded to many different bacteria. The finding of cpdB-like genes in plasmids is, in principle, consistent with the protective character of CpdB-like enzymes against innate immune responses of the host [72].
Within the superkingdom Viruses, the TblastN run without any limits, other than the taxonomical one, gave two hits (Table S2), one of them with a score of 1062 corresponding to a cpdB-like gene of an unclassified bacteriophage of family Myoviridae [73,74].
Within the superkingdom Eukaryota, the TblastN run without any limits, other than the taxonomical one, gave 4 hits (Table S3). Three of them, with score 1185, corresponded to the genome of Digitaria exilis, a nutritious cereal known as white fonio that constitutes a vital crop of West Africa [75]. The fourth one, with score 457, corresponded to the genome of Leishmania major, a protozoan parasite that invades macrophages and causes cutaneous leishmaniasis [76,77].
The presence of cpdB-like genes in plasmids, viruses and, particularly, in a higher plant or a parasitic protozoan is intriguing. One wonders, for instance, whether CpdB-like proteins could have in Leishmania the same protective effect against the host immune system as they display in Bacteria.

Cloning, Expression and Purification of Recombinant CpdB
Genomic DNA of S. enterica subsp. enterica serovar Typhimurium strain LT2 was acquired from the Colección Española de Cultivos Tipo (CECT, Valencia, Spain). The coding sequence of the mature CpdB protein without signal sequence (GenBank accession number NC_003197.2:c4639575-4641461) was amplified with primers CACTGGGGATCCGCCACCGTCGATCTCCGTATCATGG (forward) and CTGCACGAATTCTTACTTGCTTAAATCCACCTG (reverse), containing BamHI (GGATCC) and EcoRI (GAATTC) sites, respectively. The amplicon was expected to contain the desired coding sequence flanked by those restriction sites. It was obtained with the Advantage cDNA polymerase mix (Clontech), so it carried 3′-A overhangs that allowed ligation with T4 DNA ligase to the 3′-T overhangs of the pGEM-T Easy vector (Promega). Transformation of competent JM109 cells (Promega) yielded white colonies from which plasmids were obtained (High Pure Plasmid Isolation Kit, Roche). After the correct construct was identified by sequencing, it was cut with BamHI and EcoRI, and the insert was ligated into the corresponding sites of the pGEX-6P-3 vector, in frame with the PreScission protease cleavage site and the glutathione S-transferase (GST) tag. The resulting construct (pGEX-6P-3-S.enter_cpdB) was analyzed by double-strand sequencing (Servicio de Genómica, Instituto de Investigaciones Biomédicas Alberto Sols, Consejo Superior de Investigaciones Científicas, Universidad Autónoma, Madrid). The sequence of the insert matched the genomic coding sequence exactly.
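Because the underlining that originally marked the restriction sites does not survive in plain text, a short script can locate them in the primer sequences given above. This is a sketch; the helper names are hypothetical. It also reads the TTA that follows the EcoRI site in the reverse primer as a TAA stop codon on the coding strand, which is our reading of the sequence rather than a statement from the text.

```python
# Recognition sequences; both are palindromic, so they read the same on either strand.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

FORWARD = "CACTGGGGATCCGCCACCGTCGATCTCCGTATCATGG"
REVERSE = "CTGCACGAATTCTTACTTGCTTAAATCCACCTG"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def site_position(seq: str, enzyme: str) -> int:
    """0-based start of the first recognition site for `enzyme`, or -1 if absent."""
    return seq.upper().find(SITES[enzyme])


def codon_before_site(seq: str, enzyme: str) -> str:
    """The three bases immediately 5' of the first recognition site for `enzyme`."""
    pos = site_position(seq, enzyme)
    if pos < 3:
        raise ValueError(f"no complete codon upstream of the {enzyme} site")
    return seq.upper()[pos - 3:pos]
```

Applied to the primers, `site_position(FORWARD, "BamHI")` and `site_position(REVERSE, "EcoRI")` both return 6, i.e., each site is preceded by a 6-nt clamp, and `codon_before_site(reverse_complement(REVERSE), "EcoRI")` returns `"TAA"`.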
The expression and purification of the recombinant CpdB were performed as described [7]. In brief, BL21 cells were transformed with pGEX-6P-3-S.enter_cpdB under ampicillin selection; transformed cells were cultured in suspension, induced with IPTG (which acts on the tac promoter of the vector) and collected by centrifugation. The supernatant of the cell lysate was used to purify the GST fusion protein by affinity chromatography on GSH-Sepharose (GE Healthcare Life Sciences), followed by release of CpdB from the GST tag by specific proteolysis with PreScission protease (GE Healthcare Life Sciences). This yielded mature CpdB with a GPLGS N-terminal extension and a purity of 80-85%, as estimated by SDS gel electrophoresis and image analysis [78].

Enzymatic Assays
All the reactions assayed involved the phosphohydrolysis of monoester, diester or anhydride phosphate linkages. Enzyme incubations were carried out at 37 °C in mixtures containing 50 mM Tris-HCl (pH 7.5 at 37 °C), 2 mM MnCl₂, 0.1 mg mL⁻¹ bovine serum albumin, and diverse concentrations of substrate and recombinant enzyme, under conditions of linearity with respect to incubation time and amount of enzyme. Enzyme-less and/or substrate-less controls were run and subtracted from the results of full reaction mixtures. In assays with nucleoside mono-, di- and triphosphates, and 4-NPhP, the phosphate liberated as a product was measured colorimetrically. For assays with cyclic mononucleotides, diadenosine oligophosphates, NDP-sugars, CDP-choline and bis-4-NPhP as substrates, an excess of alkaline phosphatase was included in the reaction mixture to liberate phosphate from the reaction products. The colorimetric assay of inorganic phosphate is described elsewhere [7]. The hydrolysis of linear or cyclic oligonucleotides was studied by HPLC with detection at 260 nm, and reaction products separated from substrates were quantified as described below.
The hydrolysis of pApA, c-di-AMP, c-tetra-AMP and c-hexa-AMP was assayed by measuring the accumulation of 5′-AMP (2, 2, 4 or 6 mol per mole of substrate, respectively), except for the hydrolysis of c-hexa-AMP by S. enterica CpdB, which was assayed by measuring substrate consumption. This avoided complications due to the formation of linear oligonucleotide intermediates as the reaction progressed towards 5′-AMP. In contrast, S. suis SntA gave 5′-AMP as the only detectable product, indicating that linear oligonucleotide intermediates were rapidly hydrolyzed. The hydrolysis of pGpG was assayed by measuring the accumulation of GpG, 5′-GMP and guanosine. The hydrolysis of 3′,3′-cGAMP was assayed by measuring the accumulation of adenosine and guanosine, indicating that 3′-nucleotide products were rapidly dephosphorylated. The very slow hydrolysis of 2′,3′-cGAMP was assayed by the formation of a poorly characterized product tentatively identified as G2′p5′A.
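The stoichiometries above convert the amount of 5′-AMP formed into moles of substrate hydrolyzed. A minimal sketch, assuming that 5′-AMP is the only product (which, as noted, did not hold for c-hexa-AMP with S. enterica CpdB); the dictionary and function names are hypothetical.

```python
# mol of 5'-AMP released per mol of substrate completely hydrolyzed (from the text)
AMP_PER_SUBSTRATE = {"pApA": 2, "c-di-AMP": 2, "c-tetra-AMP": 4, "c-hexa-AMP": 6}


def substrate_hydrolyzed(amp_nmol: float, substrate: str) -> float:
    """nmol of substrate hydrolyzed, assuming 5'-AMP is the only product."""
    return amp_nmol / AMP_PER_SUBSTRATE[substrate]


def initial_rate(amp_nmol: float, substrate: str, minutes: float) -> float:
    """nmol of substrate hydrolyzed per minute, valid within the linear range."""
    if minutes <= 0:
        raise ValueError("incubation time must be positive")
    return substrate_hydrolyzed(amp_nmol, substrate) / minutes
```

If linear intermediates accumulate, dividing the 5′-AMP formed by the full stoichiometry underestimates substrate consumption, which is why substrate disappearance was measured instead for c-hexa-AMP with S. enterica CpdB.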
Saturation kinetics were studied by measuring initial rates at diverse substrate concentrations. The kinetic parameters kcat and KM were estimated by nonlinear regression as described [10]. The catalytic efficiency equals the quotient kcat/KM; when saturation plots could not be obtained, this parameter was derived from initial rates measured at substrate concentrations much lower than the KM, i.e., when the initial rate is practically proportional to substrate concentration and most of the enzyme is in free form. Under these conditions, kcat/KM = v/([E][S]), [E] being the total enzyme concentration [79].
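The low-substrate shortcut follows from the Michaelis-Menten equation, v = kcat[E][S]/(KM + [S]): when [S] << KM, the denominator approaches KM and v ≈ (kcat/KM)[E][S]. The sketch below illustrates both routes. The article cites [10] for its nonlinear regression but does not name the algorithm, so the Gauss-Newton least-squares fit used here is a stand-in of our choosing, and all function names and test values are hypothetical.

```python
def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial rate predicted by v = Vmax*[S]/(KM + [S])."""
    return vmax * s / (km + s)


def fit_michaelis_menten(s_values, v_values, iterations=50, tol=1e-12):
    """Gauss-Newton least-squares estimate of (Vmax, KM) from initial rates."""
    if len(s_values) != len(v_values) or len(s_values) < 2:
        raise ValueError("need at least two paired (S, v) points")
    vmax = max(v_values)
    # Starting KM: the concentration whose rate is closest to half the top rate.
    km = min(zip(s_values, v_values), key=lambda p: abs(p[1] - vmax / 2))[0]
    for _ in range(iterations):
        # Accumulate the normal equations (J^T J) step = J^T r for both parameters.
        a11 = a12 = a22 = g1 = g2 = 0.0
        for s, v in zip(s_values, v_values):
            d_vmax = s / (km + s)              # partial derivative w.r.t. Vmax
            d_km = -vmax * s / (km + s) ** 2   # partial derivative w.r.t. KM
            r = v - michaelis_menten(s, vmax, km)
            a11 += d_vmax * d_vmax
            a12 += d_vmax * d_km
            a22 += d_km * d_km
            g1 += d_vmax * r
            g2 += d_km * r
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break  # singular system; keep the current estimate
        step_vmax = (a22 * g1 - a12 * g2) / det
        step_km = (a11 * g2 - a12 * g1) / det
        vmax += step_vmax
        km += step_km
        if abs(step_vmax) < tol and abs(step_km) < tol:
            break
    return vmax, km


def kcat_from_vmax(vmax: float, enzyme_total: float) -> float:
    """kcat = Vmax/[E], with [E] the total enzyme concentration."""
    return vmax / enzyme_total


def efficiency_low_substrate(v: float, enzyme_total: float, s: float) -> float:
    """kcat/KM = v/([E][S]); valid only when [S] << KM."""
    return v / (enzyme_total * s)
```

With synthetic rates generated from KM = 2 and kcat/KM = 500 (arbitrary units) at [S] = 0.001, the shortcut returns 499.75, within 0.05% of the true value; the error grows as [S] approaches KM.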

HPLC Methods
The HPLC analyses were run on a Tracer Excel 120 column (150 mm × 4 mm) protected by a pre-column (10 mm × 4 mm) of the same material (octadecylsilica; Teknokroma, San Cugat del Vallés, Barcelona). An HP 1100 system was used with a diode array detector set to measure A260. Samples of 20 µL were injected and elution was performed at 1 mL/min with four different buffers: A, 5 mM Na-phosphate, pH 7.0, 5 mM tetrabutylammonium, 20% methanol (by vol.); B, 100 mM Na-phosphate, pH 7.0, 5 mM tetrabutylammonium, 20% methanol; C, 10 mM Na-phosphate, pH 7.0; D, 10 mM Na-phosphate, pH 7.0, 50% methanol. The buffers used and the gradient method applied depended on the enzymatic reaction being studied, as indicated below.
To analyze the hydrolysis of pApA, the initial mobile phase was 80% A, 20% B, and a linear gradient was applied up to 100% B in 10 min.
To analyze the hydrolysis of pGpG, the initial mobile phase was 100% A, and a linear gradient was applied up to 40% A and 60% B in 10 min, followed by another linear gradient up to 30% A and 70% B in 5 min.
To analyze the hydrolysis of c-di-AMP, the initial mobile phase was 100% A, and a linear gradient was applied up to 40% A and 60% B in 5 min, followed by another linear gradient up to 30% A and 70% B in 10 min.
To analyze the hydrolysis of c-di-GMP, the initial mobile phase was 80% A, 20% B, and a linear gradient was applied up to 40% A and 60% B in 10 min.
To analyze the hydrolysis of c-tetra-AMP and c-hexa-AMP, the initial mobile phase was 100% C, and a linear gradient was applied up to 100% D in 10 min.
To analyze the hydrolysis of 2′,3′-cGAMP and 3′,3′-cGAMP, the initial mobile phase was 100% A, and a linear gradient was applied up to 50% A and 50% B in 4 min, followed by another linear gradient up to 100% B in 1 min and isocratic elution with 100% B for 4 min.
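The gradient programs above can be encoded as breakpoint tables and interpolated linearly to obtain the mobile-phase composition at any time. A sketch with hypothetical names: each program stores the percentage of its second buffer (B, or D for the cyclic oligoadenylates), the first buffer making up the remainder, and the composition is assumed to hold at its last value once the program ends.

```python
from bisect import bisect_right

# program: (first buffer, second buffer, [(time in min, % second buffer), ...])
GRADIENTS = {
    "pApA": ("A", "B", [(0, 20), (10, 100)]),
    "pGpG": ("A", "B", [(0, 0), (10, 60), (15, 70)]),
    "c-di-AMP": ("A", "B", [(0, 0), (5, 60), (15, 70)]),
    "c-di-GMP": ("A", "B", [(0, 20), (10, 60)]),
    "c-tetra-AMP/c-hexa-AMP": ("C", "D", [(0, 0), (10, 100)]),
    "cGAMP": ("A", "B", [(0, 0), (4, 50), (5, 100), (9, 100)]),
}


def composition(program: str, t_min: float) -> dict:
    """Percent of each buffer at time t_min, by linear interpolation."""
    first, second, points = GRADIENTS[program]
    times = [t for t, _ in points]
    if t_min <= times[0]:
        pct = float(points[0][1])
    elif t_min >= times[-1]:
        pct = float(points[-1][1])  # hold the final composition
    else:
        i = bisect_right(times, t_min)  # first breakpoint later than t_min
        (t0, p0), (t1, p1) = points[i - 1], points[i]
        pct = p0 + (p1 - p0) * (t_min - t0) / (t1 - t0)
    return {first: 100.0 - pct, second: pct}
```

For instance, halfway through the pApA gradient (5 min) the mobile phase is 40% A and 60% B.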

Analyses of the Genomic Distribution of cpdB-like Genes
To perform a systematic study of the distribution of cpdB-like genes in the Eubacteria superkingdom, we chose the protein sequence of S. enterica CpdB as the query (accession number P26265) and searched for genomic sequences that, after translation, give significant alignments. The TblastN tool of NCBI was used to this end [64,65]. Without any limits imposed on the search, the results would be overwhelmingly complex, because GenBank currently stores large numbers of microbial genomes for individual species. For instance, a TblastN run with query P26265 against complete genomes of S. suis (158 available on 14 December 2022), without any limit other than the organism Taxonomy ID, hits 127 sequences with scores of 409-549 and very significant E values of 10⁻¹⁴⁴ or lower. Such an abundance of sequences for a single species precludes an unbiased panorama of the genomic distribution in Eubacteria, particularly because species with many sequenced genomes would be overrepresented. To avoid this bias, and to obtain results in which each species is represented by a single genome or a small number of genomes, the option to limit the search to "sequences from type material" was activated in the microbial TblastN launch page [68]. In addition, the Genome database contains many plasmid sequences that count as complete genomes of the host bacterial species. To eliminate these "false" genomes from the TblastN results, the Entrez query "NOT plasmid[Title]" was added to the search. Finally, for an alignment with P26265 to be counted as a hit in each taxonomical group, we required an alignment score > 150 and a query coverage > 70%.
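The counting criteria can be expressed as a post-filter on TblastN results. The sketch below is not the authors' pipeline: the `Hit` record, its fields and the query length used in the example are hypothetical, query coverage is simplified to a single aligned segment (BLAST merges multiple HSPs when it reports coverage), and the plasmid exclusion is assumed to have already been applied at search time with the Entrez query quoted above.

```python
from dataclasses import dataclass

MIN_SCORE = 150            # a hit must score above this value
MIN_QUERY_COVERAGE = 0.70  # and cover more than 70% of the query


@dataclass(frozen=True)
class Hit:
    """One TblastN alignment (hypothetical record; a single HSP is assumed)."""
    genome: str        # subject sequence identifier
    score: float       # alignment score as reported by TblastN
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive
    query_length: int  # length of the query protein

    @property
    def query_coverage(self) -> float:
        return (self.query_end - self.query_start + 1) / self.query_length


def is_counted(hit: Hit) -> bool:
    """Apply both thresholds: score > 150 and query coverage > 70%."""
    return hit.score > MIN_SCORE and hit.query_coverage > MIN_QUERY_COVERAGE


def genomes_with_hits(hits) -> set:
    """Genomes with at least one counted hit; each genome is counted once."""
    return {hit.genome for hit in hits if is_counted(hit)}
```

Counting genomes rather than alignments keeps a genome with several qualifying segments from being counted twice when computing the percentages reported in Tables 2-6.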
When the above search strategy gave 0-2 hits, and in other selected cases, the search was repeated after removing the limit "sequences from type material" (Tables 2-5). This was also done for analyses of individual species, genera or specific groups (Table 6). In special cases (Tables S1-S3), the TblastN search was run in the NCBI Nucleotide database (nr/nt).

Conclusions
S. enterica CpdB is a phosphohydrolase of broad specificity with a remarkable selectivity for some substrates over others. Cyclic and linear dinucleotides are hydrolyzed with efficiencies of 10⁴ to 10⁷ M⁻¹ s⁻¹, which makes CpdB capable of acting on these bacterial regulators in the extracytoplasmic medium of bacteria, and in the context of host-pathogen interaction.
S. enterica CpdB and S. suis SntA hydrolyze, albeit with lower efficiencies, the cyclic oligoadenylates c-tetra-AMP and c-hexa-AMP, and so these enzymes add to the shortlist of phosphodiesterases able to degrade these novel bacterial second messengers.
In the genomes of superkingdom Bacteria, cpdB-like genes are far from ubiquitous, as they are present in some phyla but not in others.
Within phyla containing cpdB-like genes, their distribution is not homogeneous, since at any taxonomical level above species there are few taxa containing a cpdB-like gene in all the sequenced genomes.
At the level of species, the distribution is more homogeneous, as out of 77 taxa investigated, 38 show a widespread or near widespread distribution of cpdB-like genes, 11 show a partial distribution, and 28 do not contain cpdB-like genes.
Species that do not contain cpdB-like genes cannot exploit the CpdB-like protein-dependent strategy of degrading extracellular cyclic dinucleotides recognized as PAMPs by the infected host.
Species in which cpdB-like genes are widespread constitute a field where the possible role of these genes in virulence can be explored by creating gene mutants and studying the enzyme specificity of the CpdB-like protein.
In species with a partial distribution of cpdB-like genes, the presence or absence of a cpdB-like gene could modulate the virulence of pathogen strains or isolates.

Informed Consent Statement: Not applicable.
Data Availability Statement: The supporting raw data may be requested from the corresponding author without reservation.